Content

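The phenomenon in which, as model capability increases, complex prompt-engineering strategies end up degrading performance instead of improving it.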

Acceptance

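Prompting Inversion refers to the phenomenon in which complex prompt-engineering strategies end up degrading performance as model capability increases.
The 2025 paper “You Don’t Need Prompt Engineering Anymore: The Prompting Inversion” tested this on the GSM8K math-reasoning benchmark: on GPT-4o (mid capability), the Sculpting prompt reached 97% versus 93% for CoT (a 4 pp advantage); on GPT-5 (high capability), the same prompt dropped to 94% versus 96.36% for CoT (a 2.36 pp disadvantage).
Three error patterns triggered by over-constraint: Hyper-literal interpretation, Rejection of reasonable inference, and Over-constraint (excessive constraints that restrict problem solving).
Core mechanism: overly formal, rigid prompts constitute a distributional shift away from the kinds of instructions the model encountered during alignment training.
Anthropic's official documentation for Claude 4.5/4.6 states explicitly: “Where you might have said ‘CRITICAL: You MUST use this tool when…’, you can use more normal prompting like ‘Use this tool when…’”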
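
The inversion is easiest to see as a sign flip in the Sculpting-minus-CoT gap. The sketch below recomputes that gap from the GSM8K figures quoted in this note; the dictionary layout and the function name `sculpting_gap_pp` are illustrative assumptions, not anything taken from the paper.

```python
# GSM8K accuracy (percent) as quoted in this note; not re-measured here.
RESULTS = {
    "GPT-4o": {"sculpting": 97.0, "cot": 93.0},
    "GPT-5": {"sculpting": 94.0, "cot": 96.36},
}


def sculpting_gap_pp(model: str) -> float:
    """Return Sculpting accuracy minus CoT accuracy, in percentage points."""
    scores = RESULTS[model]
    # Round to two decimals to hide floating-point noise in the subtraction.
    return round(scores["sculpting"] - scores["cot"], 2)


for model in RESULTS:
    # A positive gap means the Sculpting prompt helped; a negative gap means it hurt.
    print(f"{model}: {sculpting_gap_pp(model):+.2f} pp")
```

A positive gap on the mid-capability model and a negative gap on the high-capability model is the pattern the note calls Prompting Inversion.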

Source: 2026-03-28 Claude Code conversation (research carried out during the spec-skill reform task)
2026-03-28-bilingual-prompting-research
Paper: You Don’t Need Prompt Engineering Anymore: The Prompting Inversion (arXiv 2510.22251)
Anthropic Prompting Best Practices: https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices

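This finding directly shaped the constraint design of our skill system. The original plan was to simplify all constraints for Claude 4.6, but because the skills are also used by weaker models (Sonnet, Haiku, MiniMax), we settled on a “call-layer tiering” approach: the main agent reads the full constraints and, when dispatching a subagent, adjusts the level of detail it injects according to that model's capability.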
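
A minimal sketch of what call-layer tiering could look like, not the actual skill-system code. The tier table, the model labels (plain labels, not API model IDs), the `Constraint` fields, and the choice to give weaker models the more detailed form are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical capability tiers. The labels and their tiers are illustrative only.
MODEL_TIER = {
    "claude-4.6": "strong",
    "sonnet": "weak",
    "haiku": "weak",
    "minimax": "weak",
}


@dataclass(frozen=True)
class Constraint:
    rule: str     # the instruction itself, in plain wording
    reason: str   # why the rule exists
    example: str  # a concrete illustration of the rule


def render_for_subagent(constraints: list[Constraint], model: str) -> str:
    """Build the constraint text injected into a subagent prompt for `model`."""
    # Unknown models fall back to the detailed form (an assumption of this sketch).
    tier = MODEL_TIER.get(model, "weak")
    blocks = []
    for c in constraints:
        if tier == "strong":
            # Strong models get only the plain rule.
            blocks.append(f"- {c.rule}")
        else:
            # Weaker models also get the reason and an example.
            blocks.append(f"- {c.rule}\n  Reason: {c.reason}\n  Example: {c.example}")
    return "\n".join(blocks)


constraints = [
    Constraint(
        rule="Write the spec before writing code.",
        reason="Reviewers need to agree on scope first.",
        example="Draft the spec file, then open the implementation task.",
    )
]
print(render_for_subagent(constraints, "claude-4.6"))
print(render_for_subagent(constraints, "haiku"))
```

The main agent keeps the full `Constraint` records; only the rendering step at dispatch time varies with the target model.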
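The core takeaway is not “don't write constraints” but “match the complexity of constraints to the capability of the target model.” Shift the strength of a constraint away from the “tone” dimension (ALL CAPS, MUST, NEVER) and toward the “structure and clarity” dimension (XML tags, Reason explanations, examples).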
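
One way to picture the move from tone to structure is a small renderer that wraps a plainly worded rule in XML tags together with its reason and an example. This is a sketch under stated assumptions: the tag names (`constraint`, `rule`, `reason`, `example`) and the function `render_constraint` are hypothetical, not a format prescribed by Anthropic or by the paper.

```python
from xml.sax.saxutils import escape


def render_constraint(rule: str, reason: str, example: str) -> str:
    """Render one constraint as an XML-tagged block (tag names are hypothetical)."""
    # escape() protects the tag structure if the text contains &, < or >.
    return (
        "<constraint>\n"
        f"  <rule>{escape(rule)}</rule>\n"
        f"  <reason>{escape(reason)}</reason>\n"
        f"  <example>{escape(example)}</example>\n"
        "</constraint>"
    )


# Plain wording in place of an emphatic "CRITICAL: You MUST ..." style.
print(
    render_constraint(
        rule="Use this tool when the user asks about file contents.",
        reason="The tool reads the current file instead of guessing from memory.",
        example="User: 'What does the config say?' -> call the tool, then answer.",
    )
)
```

The emphasis comes from the explicit reason and example rather than from capital letters, which is the shift described above.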